ABSTRACT



A meta-analysis was conducted to determine whether there 
were differences between the assigned academic achievement levels of students 
who were assessed with traditional methods of assessment and those who were 
assessed with alternative methods. From the more than 800 studies identified 
through literature searches, 7 studies, with a total of 5,020 student 
achievement levels and 15 effect sizes, were selected for the analysis. 
Findings suggest that efforts to compare the effectiveness of traditional and 
alternative assessment on academic achievement may be exercises in futility 
since there was no consensual agreement on the meaning of the term "academic 
achievement" and there were different connotations for "reliability." 

However, currently available data suggest a very small, if not trivial, gain 
for the use of alternative assessment procedures, and given how costly these 
procedures are, the benefits may not outweigh the costs. (Contains 1 table 
and 22 references.) (SLD) 
INTRODUCTION

Traditional assessments are currently under a siege of criticism from 
proponents of "authentic" assessment, who insist that the objectivity of
paper and pencil tests is incongruent with contemporary classroom in- 
struction (Shepard et al., 1996). Moreover, these critics argue that objec- 
tive tests prohibit the measurement of higher-order thinking skills (Hasit 
& DiObilda, 1996). Linn and Gronlund (1995), on the other hand, have 
provided detailed examples of the effectiveness of objective tests in the 
measurement of such skills. Further, Brennan and Johnson (1995) also 
point out that even though "the 'authentic nature' of performance assess- 
ments is quite appealing," it should be remembered that "the realism of 
performance assessments comes at the cost of limitations in the 
generalizability of results." In agreement, Phillips (1993) criticizes alter- 
native assessments because of their "lack of generalizability from selected 
tasks to the domain of interest." Hence, it appears that the external valid- 
ity of alternative, or performance, assessments is in question. 

The reliability of performance assessments is also under scrutiny, as 
seen in Willson's (1991) insistence that this methodology "cannot ignore 
fundamental psychometric principles of reliability . . .," and Brennan and 
Johnson (1995) warn that these assessments "raise a host of technical 
problems that must be faced if annual performance assessments are to 
yield comparable results from year to year." Nevertheless, Hirsch (1996) 
reports that advocates of performance assessment proclaim that such
assessments are superior to objective tests because they are more
informative and motivational, and are also fairer to minorities and
nonverbal students. Concurring, Meisels and Dorfman (1995) assert that
minorities, especially African Americans, "have not fared well under the
domination of multiple choice examinations." And Lanaer et al. (1990)
add that multiple choice tests measure only recognition and retention
while alternative assessment measures "the thinking curriculum."
Moreover, Willson (1991) echoes that the perceived weakness of the
multiple choice test in assessing higher-order thinking skills has
necessitated the development of writing samples "for many state
assessments."

The findings of Davis and Felknor (1994), however, disclose that a
majority of students oppose alternative assessments, and only a few
feel that these assessments are motivating. Also, Dorfman and Steele (1995)
point out that the National Assessment of Educational Progress has revealed
that the mean differences between blacks and whites on "the
extended-response essays" exceed those differences "found on the multiple
choice reading assessment." Then, in response to the reported deficiencies
of objective examinations in measuring complex thinking skills, Phillips
(1993) reminds us that Forsyth (1976) has provided extensive examples
of the capacity of objective items to measure higher-order thinking
processes; and Phillips (1993) also reminds us that Mehrens (1990) has
observed that even cognitive psychologists warn against the widespread use
of alternative assessment until such theories are documented by extensive
research.



Obviously, proponents of both traditional and alternative assessments 
are insistent that their respective methodologies are more conducive to the 
academic achievement of America's students. Presently, however, there is 
no clear consensus favoring either method. Hopefully, this study will
provide broader findings that will help resolve the current dilemma.



STATEMENT OF THE PROBLEM 

The enhancement of academic achievement in America's schools was 
the underlying impetus for performing this meta-analysis on all suitable 
research that has compared traditional with alternative assessments. In
compliance with the previously stated purpose of the investigation, this
study addresses the following question:



Are there differences between the assigned academic achieve- 
ment levels of students who were assessed with traditional 
methods of assessment, and those who were assessed with al- 
ternative methods? 



METHODOLOGY 

The meta-analytic approach used in this study follows the procedure 
developed by Glass et al. (1981). More specifically, this approach to
meta-analysis requires the following: (a) locating studies through unbiased
and replicable data searches, (b) selecting studies based on predetermined
criteria, (c) describing each study's outcomes and then creating a common
scale (effect size), and (d) using statistical methods to quantify a
specific conclusion from a mixed set of results. Fundamentally,
meta-analysis is a quantitative application of empirical deduction that
would have been impossible through any other previously known methodology
(Gall et al., 1996).

Locating of documents. The studies examined in this research were 
selected from a computer search of the databases ERIC (1966-March 
1999), Dissertation Abstracts (1861-August 1997), and PsychLit (1974- 
September 1997). These databases were searched with the keywords
"alternative assessment," "traditional assessment," "evaluation," and
"achievement," which identified over 800 studies to be reviewed for
inclusion in
the meta-analysis. The included studies met the following predetermined 
criteria: 

1. they were conducted in an educational setting;

2. they included quantitative results in which academic achievement was
identified by the author(s) as the dependent variable, and the
assessment methodology was the independent variable;

3. they had experimental, quasi-experimental, or correlational research
designs;

4. the sample sizes had a combined minimum of 20 students in the
experimental and control groups;

5. all academic achievement was reported as interval data;

6. they had sufficient statistical data to calculate an effect size.

Coding of the variables. Traditional assessments, for the most part,
consist of paper and pencil objective and essay examinations, whereas
alternative assessments encompass the evaluation of students' reflective
journal writing, group projects, self-assessments, slide shows, oral
presentations, writing samples, and so on. Basically, academic achievement
is defined as teacher-assigned grades or as student scores on standardized
tests. However, all academic achievement included in this meta-analysis
was reported in terms of interval data.

Seven of more than 800 relevant publications met the prearranged
criteria for inclusion in the meta-analysis, whereas those studies that were
rejected did not meet each of the six criteria necessary for incorporation
into the study. Generally, those studies not meeting the six prerequisite
criteria did not employ statistical analyses. Moreover, if a study employs
multiple dependent variables as if they were separate entities, Glass (1981)
posits that calculating the multiple effect sizes from such a study is an
acceptable procedure for calculating average effect sizes, thus
sanctioning the presence of multiple independent comparisons in independent
research articles. In compliance with Glass's theoretical methodology of
meta-analysis, this study was able to disclose 15 effect sizes from the
seven studies examined in its meta-analysis.



ANALYSIS 

As previously mentioned, the data were analyzed through a
meta-analytic technique, which relies heavily on the calculation of effect
sizes for establishing statistical meaning (Wolf, 1986). According to Glass
et al. (1981), effect size is the degree to which a phenomenon is present in
the population of the study. In meta-analysis (Wolf, 1986), effect size is
calculated to determine the presence of a statistical difference between
means, expressed in standard deviation (SD) units.
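
The definition above can be written compactly as a standardized mean
difference. The source does not print its formula, so the notation here
is an assumption: Glass et al. (1981) divide the difference in group means
by the comparison-group standard deviation, while the note to Table 1
indicates that pooled, within-school standard deviations were used for some
comparisons.

```latex
ES = \frac{\bar{X}_{A} - \bar{X}_{T}}{SD}
```

Here \bar{X}_{A} and \bar{X}_{T} stand for mean achievement under
alternative and traditional assessment, respectively, so that a positive ES
favors alternative assessment, which is the sign convention the text uses
when it interprets the mean effect size.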
meta-analysis

Seven studies with a total of 5,020 student achievement levels and 15
effect sizes generating 15 conclusions met the predetermined criteria for
incorporation into the meta-analysis. The individual sample sizes ranged
from 25 to 1,381, and the mean sample size was 335. Table 1 displays the
author(s), date, sample size, standard unweighted mean effect size, and
standard error for each of the included studies.



Table 1. Date, Sample Size, and Effect Sizes

Author(s)               Date      n       ES      SE

Joyce, et al.           1988    286   -0.212   0.006
Laesch, et al.          1987     30    0.517   0.064
Macciomei, N. R.        1995     46    0.031   0.041
Macciomei, N. R.        1995     46   -0.071   0.041
Macciomei, N. R.        1995     46   -0.028   0.041
Satumelli, et al.       1995   1381    1.186   0.001
Seda-Santana, et al.    1988     28    0.561   0.070
Seda-Santana, et al.    1988     25    0.200   0.080
*Shepard, et al.        1996    500   -0.102   0.004
*Shepard, et al.        1996    498   -0.521   0.004
*Shepard, et al.        1996    496   -0.171   0.004
*Shepard, et al.        1996    533   -0.034   0.003
*Shepard, et al.        1996    536   -0.101   0.003
*Shepard, et al.        1996    534   -0.171   0.003
Slater, et al.          1995     35   -0.184   0.050

*Estimated effect size calculations are based on pooled, within-school
standard deviations.
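
The summary figures quoted in the text can be checked against Table 1 with
a few lines of Python. This is only an illustrative sketch: the variable
names are ours, and the fifth Shepard et al. effect size is entered as
-0.101, the sign that is consistent with the reported sum of 0.900 and with
the text's reference to six negative Shepard et al. effect sizes.

```python
# Rows transcribed from Table 1: (author, year, n, effect size).
# Assumption: the fifth Shepard et al. row is entered as -0.101.
TABLE_1 = [
    ("Joyce et al.", 1988, 286, -0.212),
    ("Laesch et al.", 1987, 30, 0.517),
    ("Macciomei", 1995, 46, 0.031),
    ("Macciomei", 1995, 46, -0.071),
    ("Macciomei", 1995, 46, -0.028),
    ("Satumelli et al.", 1995, 1381, 1.186),
    ("Seda-Santana et al.", 1988, 28, 0.561),
    ("Seda-Santana et al.", 1988, 25, 0.200),
    ("Shepard et al.", 1996, 500, -0.102),
    ("Shepard et al.", 1996, 498, -0.521),
    ("Shepard et al.", 1996, 496, -0.171),
    ("Shepard et al.", 1996, 533, -0.034),
    ("Shepard et al.", 1996, 536, -0.101),
    ("Shepard et al.", 1996, 534, -0.171),
    ("Slater et al.", 1995, 35, -0.184),
]

sample_sizes = [n for _, _, n, _ in TABLE_1]
effect_sizes = [es for _, _, _, es in TABLE_1]

total_n = sum(sample_sizes)                # total achievement levels
mean_n = total_n / len(sample_sizes)       # mean sample size per comparison
es_sum = sum(effect_sizes)                 # sum of the 15 effect sizes
es_mean = es_sum / len(effect_sizes)       # unweighted mean effect size

print(f"comparisons:        {len(TABLE_1)}")
print(f"total n:            {total_n}")
print(f"n range:            {min(sample_sizes)} to {max(sample_sizes)}")
print(f"mean n:             {mean_n:.0f}")
print(f"sum of ES:          {es_sum:.3f}")
print(f"unweighted mean ES: {es_mean:.3f}")
```

Under these assumptions the script gives a total n of 5020, a mean n of
about 335, a sum of 0.900, and an unweighted mean of 0.060, in agreement
with the values reported in the text.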



MEAN EFFECT SIZES 

An overall mean effect size was also computed from the 15 calculated
effect sizes. The sum of the 15 effect sizes is 0.900, and the mean
unweighted effect size was 0.060, with a standard error of 0.030. This
mean is positive, thus indicating that higher achievement levels were
attained by those students who were assessed with alternative as opposed to
traditional methodology. In addition, an average weighted unbiased estimate
of effect size (ES) of 0.168 was calculated. However, Cohen (1977)
classifies this effect as less than small. Perhaps even more important, the
study of Satumelli et al. (1995), as depicted in Table 1, included the
examination of 1,381 subjects in arriving at an effect size of 1.186, which
obviously resulted in a positive mean effect size for the total
meta-analysis. However, this large positive effect size was offset by the
six negative effect sizes disclosed by Shepard et al. (1996).
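
The weight that the Satumelli et al. (1995) comparison carries in the
unweighted mean can be illustrated with a simple leave-one-out check. This
sketch is not part of the original analysis; it assumes the Table 1 effect
sizes, with the fifth Shepard et al. value entered as -0.101 so that the
sum matches the reported 0.900, and the names used are hypothetical.

```python
# Effect sizes from Table 1, keyed by a short label for each comparison.
# Assumption: the fifth Shepard et al. value is -0.101 (see lead-in).
EFFECT_SIZES = [
    ("Joyce", -0.212), ("Laesch", 0.517),
    ("Macciomei 1", 0.031), ("Macciomei 2", -0.071), ("Macciomei 3", -0.028),
    ("Satumelli", 1.186),
    ("Seda-Santana 1", 0.561), ("Seda-Santana 2", 0.200),
    ("Shepard 1", -0.102), ("Shepard 2", -0.521), ("Shepard 3", -0.171),
    ("Shepard 4", -0.034), ("Shepard 5", -0.101), ("Shepard 6", -0.171),
    ("Slater", -0.184),
]


def unweighted_mean(pairs):
    """Arithmetic mean of the effect sizes in a list of (label, ES) pairs."""
    return sum(es for _, es in pairs) / len(pairs)


# Mean over all 15 comparisons, then with the single largest one removed.
mean_all = unweighted_mean(EFFECT_SIZES)
without_satumelli = [p for p in EFFECT_SIZES if p[0] != "Satumelli"]
mean_without = unweighted_mean(without_satumelli)

print(f"all 15 comparisons:   {mean_all:.3f}")
print(f"without Satumelli:    {mean_without:.3f}")
```

With these inputs the unweighted mean falls from 0.060 to about -0.020 when
that one comparison is dropped, which is why the positive overall mean
should be read with caution.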

Nevertheless, Wolf's (1986) interpretation of the average unweighted
effect size in SD units for the comparison between traditional assessment
methodology and alternative assessment methodology indicates that the
average student exposed to alternative assessment methodology exceeded 52.4%
of those students who were exposed to traditional assessments. Moreover,
on the basis of an average unweighted and unbiased estimate of effect
size, the typical student moved from the 50th percentile to the 52.4th
percentile when exposed to alternative assessments. Again, however, any
interpretation of these results should be tempered by an awareness of
Satumelli et al.'s (1995) unusually large positive effect size, which was
most instrumental in the comparatively higher academic achievement of
the groups receiving alternative as opposed to traditional assessments.
But the research of Shepard et al. (1996) possibly softened this effect.
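
The percentile reading of an effect size can be reproduced by evaluating
the standard normal distribution function at the effect size. The sketch
assumes that achievement scores are approximately normally distributed,
which is the usual premise of this interpretation; the function name is
ours.

```python
from statistics import NormalDist


def percentile_from_effect_size(es: float) -> float:
    """Percentile of the comparison group reached by the average treated
    student, assuming normally distributed scores and ES in SD units."""
    return 100 * NormalDist().cdf(es)


# Unweighted mean effect size reported in the text.
print(f"{percentile_from_effect_size(0.060):.1f}")  # 52.4
```

An effect size of zero maps to the 50th percentile, so the reported mean of
0.060 corresponds to a shift of roughly 2.4 percentile points.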



DISCUSSION 

As mentioned, Cohen's (1977) classification of the mean effect size of
0.060 as less than small is reinforced by Wolf's (1986) indication that: (1)
students receiving alternative assessment exceeded 52.4% of those
receiving traditional assessment; and (2) a typical student moves from the
50th to the 52.4th percentile when assessed by alternative methodology.
However, given the nature of percentiles, this is a very small, and perhaps
trivial, difference. Moreover, these conclusions are somewhat
contaminated by the encompassing nomenclature of "academic achievement."
More specifically, even teacher-assigned grades that are based on objective
test scores differ, as does student performance on separate standardized
tests. Then, when teacher evaluations of debatable academic performances
such as cooperative learning projects, skits, self-assessments, and
reflective journal writings are the basis of student grades, the definition
of "academic achievement" becomes further obscured.

Although the effect sizes of the Satumelli et al. data are positive, and
those of Shepard et al. are negative, the two data sets share distinct
commonalities. Both included large sample sizes (n = 1,381; average n = 516),
both were conducted in urban elementary school settings (New York,
Denver), and both contained relatively high minority representations (43%,
41%). However, the two differ with respect to instructional methodology
and subject areas in which academic achievement was assessed.
Specifically, the data of Satumelli et al. involved the assessment of
science, whereas those of Shepard et al. involved the assessment of pupil
performance in reading and mathematics.

Possibly, the extreme difference between the effect sizes of the two 
data sets lies in the differing teaching methodologies. The positive effect 
size of the Satumelli et al. data involved the teaching of science per se,
probably by traditional methodology. Hence, singular efforts were focused 
solely on science instruction, rather than on science in conjunction with 
another academic subject. However, the negative effect sizes of Shepard 
et al.'s data may well have involved the simultaneous teaching of reading 
and mathematics, reflecting the methodology of "whole language 
constructivism." 

Perhaps, teaching a basically quantitative subject in conjunction with 
a qualitative process such as reading could obstruct maximum achieve- 
ment in both areas, much as a child's simultaneous study of two different 
languages could restrict her optimal learning of each language, as op- 
posed to studying the two separately. Then, possibly compounding the 
situation, minority students are reported to score comparatively lower on 
alternative assessments (Davis et al., 1994). However, it must be
emphasized that these explanations are simply conjectural, and obviously
subject to further research.



CONCLUSIONS AND RECOMMENDATIONS 
FOR FURTHER RESEARCH 

It may be that efforts to compare the effectiveness of traditional and 
alternative assessment on academic achievement are "exercises in futil- 
ity." Initially, there is no consensual agreement between proponents of the 
two assessment methodologies on the term "academic achievement." Fur- 
thermore, since reliability has differing connotations for quantitative 
and qualitative methodologists, a legitimate comparison becomes even 
more questionable. 

Yet, further comparisons are definitely needed, and it is the opinion 
here that such comparisons are indeed both possible and necessary. How- 
ever, given the currently available data, only very small (if not trivial) 
gains, at best, can be attributed to the use of alternative assessment
procedures; and given how labor-intensive these procedures are, the
benefits do not necessarily outweigh the costs. Nevertheless, it seems that
if recipients of alternative assessments were allowed to engage in
traditional assessment procedures for a one-to-two-week period, then
equivalent grounds
for a comparison on the basis of objective measurement could be estab- 
lished. Granted, it is acknowledged that since all school performance is 
not academic, objective measurement is not always possible. Neverthe- 
less, it would appear that such a proposal could provide for an authentic 
academic, if not affective, comparison between traditional and alternative 
assessment. In any event, this procedure could be conducive to the further 
enhancement of assessment in contemporary American schools. 